Release OpenMP resources in blas_thread_shutdown #4080

Draft: wants to merge 3 commits into develop

Conversation

@aitap aitap commented Jun 10, 2023

OpenMP 5.0 gained the omp_pause_resource_all function, which can be used to release the runtime's locks before forking the process. Its parameters are documented under omp_pause_resource. Using this function makes it possible for OpenMP builds of OpenBLAS to pass the fork safety tests.

An important detail is that some OpenMP implementations claim only OpenMP 4.5 compatibility (i.e. they #define _OPENMP 201511) yet do provide omp_pause_resource_all. We currently have no way to use omp_pause_resource_all with such implementations (e.g. GCC 10 on Debian 11).
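To illustrate the version problem described above, here is a minimal sketch of a purely macro-based gate. The helper names (version_promises_pause, pause_openmp_if_available) are hypothetical and are not the code from this PR; the only facts relied upon are that OpenMP 5.0 corresponds to _OPENMP 201811 and that omp_pause_resource_all returns zero on success.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* _OPENMP value published with the OpenMP 5.0 specification. */
#define OPENMP_5_0_DATE 201811L

/* Hypothetical helper: does a given _OPENMP value promise the function? */
static int version_promises_pause(long openmp_date) {
	return openmp_date >= OPENMP_5_0_DATE;
}

/*
 * Hypothetical wrapper, not the function added by this PR.
 * Returns 1 if the runtime was paused, 0 if nothing was done,
 * -1 if the runtime reported a failure.
 */
static int pause_openmp_if_available(void) {
#if defined(_OPENMP) && _OPENMP >= 201811
	/* omp_pause_resource_all returns zero on success. */
	return omp_pause_resource_all(omp_pause_hard) == 0 ? 1 : -1;
#else
	/* No OpenMP, or a runtime that only advertises 4.5 (201511). */
	return 0;
#endif
}
```

With a gate like this, a runtime that reports 201511 but ships the function falls into the no-op branch, which is exactly the limitation mentioned above. A compile-and-link probe is one way around it.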

The first commit adds the omp_pause_resource_all call to blas_thread_shutdown. You may not want to merge this if you'd like blas_thread_shutdown to remain safe to call while other OpenMP operations are in progress.

The second commit adds a convoluted way to test fork safety of the resulting build on OpenMP ≥ 5.0 in addition to non-OpenMP builds. I'd be glad to see a better way of testing for OpenMP ≥ 5.0 in a Makefile.

@martin-frbg
Collaborator

Thanks - I have not had time to review in detail, but perhaps it might be better to put the OpenMP version test in the c_check script rather than in getarch.

@aitap aitap force-pushed the omp_pause_resource branch from 570d6ba to 645eabe Compare June 11, 2023 07:26
@aitap
Author

aitap commented Jun 11, 2023 via email

@aitap aitap marked this pull request as draft February 19, 2025 13:28
Assume that if the ctest3.c program compiles and links successfully,
the function is available.

OpenMP 5.0 introduced the function omp_pause_resource_all that instructs
the runtime to "relinquish resources used by OpenMP on all devices". In
practice, these resources include the locks that would otherwise trip up
the runtime after a fork(). Releasing these resources in a function
called by pthread_atfork() makes it possible for the child process to
continue functioning after the runtime automatically re-acquires its
resources.

Thread safety: blas_thread_shutdown doesn't check whether there are
other BLAS operations running in parallel, so this isn't any less safe
than before with respect to OpenBLAS function calls. On the other hand,
if there are other OpenMP operations in progress, asking the runtime to
pause may result in unspecified behaviour. A hard pause is allowed to
deallocate threadprivate variables too.
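
As a rough illustration of the mechanism this commit message describes, the sketch below registers a pthread_atfork() prepare handler that pauses the OpenMP runtime before the fork. All function names here (release_before_fork, install_fork_handler, child_work, run_in_child) are hypothetical and this is not the OpenBLAS code path; the pause call is compiled in only when the compiler advertises OpenMP 5.0 or later.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(_OPENMP) && _OPENMP >= 201811
#include <omp.h>
#endif

/* Counts prepare-handler runs so the effect can be observed. */
static int prepare_calls = 0;

/* Runs in the parent, in the thread that called fork(), before the fork. */
static void release_before_fork(void) {
	++prepare_calls;
#if defined(_OPENMP) && _OPENMP >= 201811
	/* Hard pause: the runtime may also free threadprivate data. */
	(void)omp_pause_resource_all(omp_pause_hard);
#endif
}

/* Hypothetical installer; returns 0 on success, like pthread_atfork(). */
static int install_fork_handler(void) {
	return pthread_atfork(release_before_fork, NULL, NULL);
}

/* Work done in the child: a small reduction, parallel if OpenMP is on. */
static int child_work(void) {
	int sum = 0;
#ifdef _OPENMP
	#pragma omp parallel for reduction(+:sum)
#endif
	for (int i = 0; i < 100; ++i) sum += i;
	return sum == 4950 ? 0 : 1;
}

/* Forks, runs fn() in the child and returns its exit status (or -1). */
static int run_in_child(int (*fn)(void)) {
	pid_t pid = fork();
	if (pid < 0) return -1;
	if (pid == 0) _exit(fn());
	int status = 0;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
	return WEXITSTATUS(status);
}
```

Calling install_fork_handler() once and then run_in_child(child_work) exercises the handler. The caveat from the thread-safety paragraph still applies: the handler does not know whether other threads are inside OpenMP regions when it runs.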

In addition to testing fork safety on non-OpenMP builds, test it when
omp_pause_resource_all() is available to release the locks.
@aitap aitap force-pushed the omp_pause_resource branch from 645eabe to 806073c Compare February 19, 2025 15:54
@martin-frbg
Collaborator

Sorry, I must admit that I lost track of this... I'm not entirely sure of the binding region for this operation. What happens if this gets called while OpenBLAS is used by a program that itself uses multiple OpenMP threads? Would there be a need for an if (!omp_in_parallel()) guard or something like that?

@aitap
Author

aitap commented Feb 20, 2025

Good question. The binding region for omp_pause_resource_all is indeed the whole program, and calling it while an explicit task (in a different thread) has not finished executing may result in unspecified behaviour. If another thread is running an OpenMP computation, omp_pause_resource_all() called from the pthread_atfork() handler is allowed to break it.

Unfortunately, OpenMP provides no other API to stop and restart the worker threads, and this function must be called before fork(), not afterwards (unlike the libgomp patch, which detects having been forked after the fact and only restarts the threads in the child process).

Experimentally, the following program seems to behave fine with omp_pause_resource_all and to break without it, but that is not a strong enough argument for enabling this everywhere:

ex.c
#ifndef _OPENMP
#error Please compile me with OpenMP enabled
#endif

#include <omp.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

enum { OMP_THREADS=3 };

pthread_barrier_t barrier1, barrier2;

static void * do_fork(void * unused) {
	int sum = 0;
	puts("In child, before OpenMP loop1");
	#pragma omp parallel for num_threads(OMP_THREADS) reduction(+:sum)
	for (int i = 0; i < 100; ++i) {
		sum += i;
	}
	printf("In child, OpenMP loop1 done (%d)\n", sum);

	pthread_barrier_wait(&barrier1);
#ifndef OMIT_PAUSE_RESOURCE_ALL
	omp_pause_resource_all(omp_pause_hard);
#endif
	pid_t ret = fork();
	if (ret < 0) abort();
	if (ret > 0) {
		pthread_barrier_wait(&barrier2);
		return NULL;
	}
	puts("In child, before OpenMP loop2");
	sum = 0;
	#pragma omp parallel for num_threads(OMP_THREADS) reduction(+:sum)
	for (int i = 0; i < 100; ++i) {
		sum += i;
	}
	printf("In child, OpenMP loop2 done (%d)\n", sum);
	exit(0);
}


int main() {
	if (pthread_barrier_init(&barrier1, NULL, OMP_THREADS+1)) abort();
	if (pthread_barrier_init(&barrier2, NULL, OMP_THREADS+1)) abort();

	pthread_t t;
	int ret = pthread_create(&t, NULL, do_fork, NULL);
	if (ret) abort();

	puts("Starting parent loop");

	bool waited1=false, waited2=false;
	int sum = 0;
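	/* firstprivate: each thread's copy starts as false; plain private copies would be uninitialized */
	#pragma omp parallel for num_threads(OMP_THREADS) firstprivate(waited1, waited2) reduction(+:sum)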
	for (size_t i = 0; i < OMP_THREADS*10; ++i) {
		//do_fork(NULL);
		if (!waited1) {
			pthread_barrier_wait(&barrier1);
			waited1=true;
		}
		sum += i;
		if (!waited2) {
			pthread_barrier_wait(&barrier2);
			waited2=true;
		}
	}

	printf("Parent loop1 done (%d), attaching child thread...\n", sum);
	ret = pthread_join(t, NULL);
	printf("...ret=%d\n", ret);

	sum = 0;
	#pragma omp parallel for num_threads(OMP_THREADS) private(waited1, waited2) reduction(+:sum)
	for (size_t i = 0; i < OMP_THREADS*10; ++i) {
		sum += i;
	}

	printf("Parent loop2 done (%d)\n", sum);
}
gcc -g -fopenmp -o ex ex.c && ./ex
# Starting parent loop
# In child, before OpenMP loop1
# In child, OpenMP loop1 done (4950)
# Parent loop1 done (435), attaching child thread...
# ...ret=0
# Parent loop2 done (435)
# In child, before OpenMP loop2
# In child, OpenMP loop2 done (4950) # <-- child process succeeds using OpenMP
gcc -g -DOMIT_PAUSE_RESOURCE_ALL -fopenmp -o ex ex.c && ./ex
# Starting parent loop
# In child, before OpenMP loop1
# In child, OpenMP loop1 done (4950)
# Parent loop1 done (435), attaching child thread...
# ...ret=0
# Parent loop2 done (435)
# In child, before OpenMP loop2
# # <-- child process hung in libgomp initializing for the second loop

Back to the drawing board? Detect being a fork() child around here:

if (unlikely(blas_server_avail == 0)) blas_thread_init();

...and fall back to sequential operation instead of calling into the OpenMP runtime and potentially hanging?
